Clustering and Annotation of integrated scRNA Bone Marrow samples using Seurat

After integration and dimension reduction in the previous worksheet, the integrated samples were clustered with the SLM (smart local moving) algorithm, an improved Louvain algorithm. The cluster resolution was selected after evaluating the clustering and annotation of several runs and comparing the results with the original publication. Annotation was performed manually, based on a given marker set and an evaluation of highly expressed markers.

1 Preparations and Data

1.1 Required Packages

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(Seurat)
library(ggplot2)

packageVersion("dplyr")
[1] '0.8.99.9003'
packageVersion("Seurat")
[1] '3.1.5'

1.2 Load Data from previous worksheet

bmmc_all <- readRDS(file ="./StoredRObj/bmmc_PreProc_Ref2.rds")
DefaultAssay(object = bmmc_all) <- "integrated"

2 Clustering

Seurat’s clustering algorithms detect communities by maximizing modularity. In the first step, the shared nearest neighbor (SNN) graph is constructed:

bmmc_all <- FindNeighbors(bmmc_all, reduction = "pca", dims = 1:20)
Computing nearest neighbor graph
Computing SNN
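To make the SNN step more concrete, the following is a minimal base-R sketch on toy data, not Seurat's implementation. It assumes that the edge weight between two cells is the Jaccard overlap of their k-nearest-neighbor sets, which is how the Seurat documentation describes FindNeighbors (its defaults, k.param = 20 and prune.SNN = 1/15, are not used here). The names emb, knn and snn are made up for this illustration.

```r
# Toy embedding: 6 "cells" in 2 dimensions, two well separated groups of three
emb <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)
)
k <- 3  # neighborhood size; each cell counts as its own nearest neighbor

# Pairwise Euclidean distances between all cells
d <- as.matrix(dist(emb))

# Row i holds the indices of the k cells closest to cell i (including i itself)
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# SNN weight = Jaccard overlap of the two neighbor sets (assumption, see text)
n <- nrow(emb)
snn <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    total  <- length(union(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / total
  }
}

round(snn, 2)
```

In this toy case, cells within a group share all of their neighbors and cells from different groups share none, so the SNN matrix separates the two groups cleanly.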

Then the clusters are calculated from the SNN graph by optimizing a modularity function. The resolution parameter was tuned to obtain as many annotatable clusters as possible while keeping the number of unannotated clusters low (resolution = 0.3). The SLM algorithm was selected (algorithm = 3).

bmmc_all <- FindClusters(bmmc_all, algorithm = 3, resolution = 0.3, random.seed = 19950927)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 76645
Number of edges: 3055173

Running smart local moving algorithm...
Maximum modularity in 10 random starts: 0.9611
Number of communities: 30
Elapsed time: 116 seconds
8 singletons identified. 22 final clusters.
DimPlot(bmmc_all, reduction = "umap", label = TRUE, pt.size = 0.2, label.size = 10) + NoLegend()
Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
Please use `as_label()` or `as_name()` instead.
This warning is displayed once per session.
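The quantity that the clustering step above maximizes can be illustrated on a small graph. The sketch below uses the standard modularity definition with a resolution factor and is written in base R; it is an assumption that this matches the exact objective used by the modularity optimizer called by Seurat. The function name modularity and the toy graph are hypothetical.

```r
# Modularity Q of a partition of an undirected graph with adjacency matrix A:
#   Q = 1/(2m) * sum_ij (A_ij - resolution * k_i * k_j / (2m)) * [c_i == c_j]
modularity <- function(A, membership, resolution = 1) {
  k <- rowSums(A)                                 # node degrees
  two_m <- sum(A)                                 # twice the number of edges
  same <- outer(membership, membership, "==")     # TRUE where i and j share a community
  sum((A - resolution * outer(k, k) / two_m) * same) / two_m
}

# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric

q_split  <- modularity(A, c(1, 1, 1, 2, 2, 2))  # one community per triangle
q_merged <- modularity(A, rep(1, 6))            # all nodes in one community
```

Splitting the graph into its two triangles yields a higher modularity than keeping all nodes together, which is the kind of partition a modularity optimizer favors. Raising the resolution argument increases the penalty term and therefore tends to favor smaller communities.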

3 Marker selection and visualization

A given marker set of 55 markers in total was used to evaluate and annotate the clusters. All markers: (“AIGLC3”,“AVP”,“CCL5”,“CCR7”,“CD14”,“CD19”,“CD33”,“CD34”,“CD38”,“CD3D”,“CD3E”,“CD3G”,“CD4”,“CD74”,“CD79”,“CD79A”,“CD79B”,“CD8A”,“CD8B”,“CSF3R”,“CST3”,“DC74”,“DNTT”,“ELANE”,“FCER1A”,“FCGR3A”,“GATA1”,“GNLY”,“GYPA”,“HBA1”,“HBB”,“HBD”,“IGLC3”,“IL2RA”,“IL3RA”,“IL7R”,“JCHAIN”,“LYZ”,“MPO”,“MS4A1”,“MS4A7”,“MZB1”,“NCAM1”,“NKG7”,“PAX5”,“PF4”,“PLD4”,“PPBP”,“REXO2”,“RHAG”,“S100A9”,“SLC4A1”,“SOX4”,“SPI1”,“TCL1A”).

The following figures show violin plots and UMAP feature plots for HBA1 (representing erythrocytes), CD3D (T cells), CD79A (B cells), AVP (HSPC), CD34 (precursors and HSPC), MS4A7 (monocytes), GNLY (NK cells), FCER1A (DC) and PPBP (megakaryocytes; shown only in the feature plot).

VlnPlot(bmmc_all, features = c( "AVP","HBA1","CD3D","CD79A", "CD34", "MS4A7", "GNLY", "FCER1A"), assay = "integrated",group.by = "seurat_clusters", ncol = 2, pt.size = 0.1) + NoLegend()
Warning: Could not find CD3D in the default search locations, found in RNA assay
instead

FeaturePlot(bmmc_all, features = c("AVP","HBA1","CD3D","CD79A", "CD34", "MS4A7", "GNLY", "FCER1A", "PPBP"), reduction = "umap", ncol =3)
Warning: Could not find CD3D in the default search locations, found in RNA assay
instead
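The warning above indicates that CD3D was not found in the integrated assay and was taken from the RNA assay instead. A simple way to anticipate this is to compare the marker list with the feature names of each assay before plotting. The sketch below does this in base R with made-up feature sets; integrated_features and rna_features are hypothetical stand-ins for what rownames() would return for the two assays of the real object.

```r
# Hypothetical feature sets standing in for the row names of two assays
integrated_features <- c("AVP", "HBA1", "CD79A", "CD34", "MS4A7", "GNLY", "FCER1A")
rna_features <- c(integrated_features, "CD3D", "PPBP", "LYZ")

# Markers intended for plotting
markers <- c("AVP", "HBA1", "CD3D", "CD79A", "CD34", "MS4A7", "GNLY", "FCER1A")

# Markers that are absent from the integrated assay (plotted from RNA instead)
missing_integrated <- setdiff(markers, integrated_features)

# Markers that are absent from both assays (for example a misspelled gene symbol)
missing_everywhere <- setdiff(markers, union(integrated_features, rna_features))

missing_integrated
missing_everywhere
```

A check of this kind would also flag any entry of the marker list that does not match a gene symbol in the object at all.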

Additionally, the top markers for each cluster were identified and used to validate and improve the annotation. The following command identifies the top markers of each cluster compared to all other clusters. In specific situations, the top markers distinguishing a cluster from its neighboring clusters were also evaluated (FindMarkers).

bmmc_all.markers <- FindAllMarkers(bmmc_all, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Calculating cluster 6
Calculating cluster 7
Calculating cluster 8
Calculating cluster 9
Calculating cluster 10
Calculating cluster 11
Calculating cluster 12
Calculating cluster 13
Calculating cluster 14
Calculating cluster 15
Calculating cluster 16
Calculating cluster 17
Calculating cluster 18
Calculating cluster 19
Calculating cluster 20
Calculating cluster 21
bmmc_all.markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_logFC)
# A tibble: 66 x 7
# Groups:   cluster [22]
   p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene  
   <dbl>     <dbl> <dbl> <dbl>     <dbl> <fct>   <chr> 
 1     0     0.676 0.978 0.924         0 0       LDHB  
 2     0     0.642 0.932 0.861         0 0       TRBC1 
 3     0     0.553 0.928 0.786         0 0       CD27  
 4     0     0.748 0.884 0.845         0 1       KLRB1 
 5     0     0.584 0.932 0.872         0 1       TRBC1 
 6     0     0.545 0.888 0.818         0 1       LMNA  
 7     0     2.38  0.998 0.865         0 2       CCL5  
 8     0     1.85  0.997 0.882         0 2       NKG7  
 9     0     1.61  0.974 0.837         0 2       GZMH  
10     0     4.05  0.999 0.813         0 3       S100A9
# … with 56 more rows
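The per-cluster selection above can be reproduced without dplyr, which may help to see what the pipeline does. The following base-R sketch uses a made-up marker table with the same column names; top_per_cluster is a hypothetical helper. Unlike top_n, it sorts the rows within each cluster and cuts ties by row order instead of keeping them all.

```r
# Toy marker table with the column names used above (values are made up)
markers <- data.frame(
  cluster   = factor(c(0, 0, 0, 1, 1, 1)),
  gene      = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_logFC = c(0.4, 0.9, 0.6, 1.5, 0.3, 2.0),
  stringsAsFactors = FALSE
)

# Return the n rows with the highest avg_logFC within each cluster
top_per_cluster <- function(df, n) {
  pieces <- lapply(split(df, df$cluster), function(x) {
    head(x[order(x$avg_logFC, decreasing = TRUE), ], n)
  })
  do.call(rbind, pieces)
}

top2 <- top_per_cluster(markers, 2)
top2$gene
```

Note that later Seurat releases (4.0 onward) report the fold-change column as avg_log2FC, so the column name would need to be adapted when running this with a newer version.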

The top markers can also be visualized in a heatmap:

top10 <- bmmc_all.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(bmmc_all, features = top10$gene) + NoLegend()

4 Annotation

Based on the marker genes, the clusters were annotated. Clusters 8 and 12 were merged as “Late Erythrocytes”. Two clusters (19 and 21) could not be properly annotated. Cluster 19 showed a profile similar to T cells and might be a precursor cell type, while cluster 21 was similar to monocytes/dendritic cells but may also contain doublets.

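# One label per cluster, in the order of the cluster levels (0 to 21)
new.cluster.ids <- c(
  "CD4 Naive and regulatory T-Cells",               # cluster 0
  "CD4 Memory T-Cells ",                            # cluster 1
  "CD8 Memory T-Cells",                             # cluster 2
  "CD14 Monocytes",                                 # cluster 3
  "Mature B-Cells",                                 # cluster 4
  "CD8 Naive T-cClls",                              # cluster 5
  "NK Cells",                                       # cluster 6
  "Late Erythroid Precursors",                      # cluster 7
  "Late Erythrocytes",                              # cluster 8 (merged with 12)
  "Early Erythroid Precursors",                     # cluster 9
  "Monocyte Progenitor Cells",                      # cluster 10
  "CD16 Monocytes",                                 # cluster 11
  "Late Erythrocytes",                              # cluster 12 (merged with 8)
  "B-Cell Progenitors",                             # cluster 13
  "Hematopoietic stem and progenitor cells (HSPC)", # cluster 14
  "Early Erythrocytes",                             # cluster 15
  "Conventional Dendritic Cells (cDC)",             # cluster 16
  "B-cells (Plasma cell)",                          # cluster 17
  "Plasmacytoid dendritic cells (pDC)",             # cluster 18
  "Undefined (T-Cell progenitors?)",                # cluster 19
  "Megakaryocytes/Platelets",                       # cluster 20
  "Undefined (Duplets?)"                            # cluster 21
)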
names(new.cluster.ids) <- levels(bmmc_all)
bmmc_all <- RenameIdents(bmmc_all, new.cluster.ids)
bmmc_all@meta.data$celltype <- bmmc_all@active.ident
DimPlot(bmmc_all, reduction = "umap", label = TRUE, pt.size = 0.5, label.size = 10) + NoLegend()
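Merging clusters through relabeling, as done above for clusters 8 and 12, works because two cluster ids that receive the same label end up in the same factor level. The following base-R sketch shows the idea on toy data; the cluster ids and label names are hypothetical and independent of the Seurat object.

```r
# Toy cluster assignments for eight cells
clusters <- factor(c("0", "1", "2", "1", "0", "2", "2", "1"))

# Hypothetical labels: clusters "1" and "2" receive the same name and are merged
labels <- c("0" = "T cells", "1" = "Erythrocytes", "2" = "Erythrocytes")

# The label vector must cover every cluster level exactly once
stopifnot(setequal(names(labels), levels(clusters)))

# Look up the label of each cell; duplicated labels collapse into one level
celltype <- factor(unname(labels[as.character(clusters)]), levels = unique(labels))

table(celltype)
```

The explicit check on the names guards against a label vector that is shorter or longer than the number of clusters, which is easy to get wrong after re-running the clustering with a different resolution.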

Percentage of cells assigned to each cell type within each donor sample (each row sums to 100):

options("digits"=2)
prop.table(table(bmmc_all$orig.ident, bmmc_all$celltype),1)*100
                
                 CD4 Naive and regulatory T-Cells CD4 Memory T-Cells 
  A.GSM3396161                             16.484               8.153
  B.GSM3396162                             28.461              11.876
  C1.GSM3396163                            17.811               6.364
  C2.GSM3396165                            18.747               7.530
  Ck.GSM3396164                            25.296              12.164
  E.GSM3396166                             29.006               8.564
  F.GSM3396167                             33.214              12.178
  G.GSM3396168                              7.167               4.605
  H.GSM3396169                             25.173              12.918
  J.GSM3396170                             29.210               6.353
  K.GSM3396171                             37.557              18.081
  L.GSM3396172                             32.219              12.930
  M.GSM3396173                             21.797              10.728
  N.GSM3396174                             22.997              11.232
  O.GSM3396175                             22.158              12.367
  P.GSM3396176                             16.826               7.665
  Q.GSM3396177                             14.122               8.621
  R.GSM3396178                             23.502               7.125
  S1.GSM3396179                            24.754               9.322
  S2.GSM3396181                            22.766              10.803
  Sk1.GSM3396180                           24.264               9.543
  Sk2.GSM3396182                           26.014              10.764
  T.GSM3396183                             21.858              11.336
  U.GSM3396184                             23.431              17.360
  W.GSM3396185                             34.831               8.604
                
                 CD8 Memory T-Cells CD14 Monocytes Mature B-Cells
  A.GSM3396161                8.118         12.407          1.772
  B.GSM3396162                3.443          7.941          3.584
  C1.GSM3396163               1.969          7.555          6.273
  C2.GSM3396165               2.896          6.951          6.266
  Ck.GSM3396164              10.549          3.122         13.671
  E.GSM3396166                4.190          4.972          9.162
  F.GSM3396167               10.712          7.331          4.668
  G.GSM3396168                1.003          1.857          0.706
  H.GSM3396169               16.791          5.726         11.203
  J.GSM3396170                2.992          5.613          5.613
  K.GSM3396171               18.181          2.378          1.367
  L.GSM3396172                7.819          8.431          6.500
  M.GSM3396173                5.056         10.044          3.246
  N.GSM3396174               24.178         12.321          8.198
  O.GSM3396175               16.837         18.944          4.747
  P.GSM3396176                2.186          5.958          1.138
  Q.GSM3396177               11.741          8.621         10.673
  R.GSM3396178               16.519         14.605          4.254
  S1.GSM3396179               4.039         14.086          4.454
  S2.GSM3396181               4.442         13.579          4.341
  Sk1.GSM3396180             16.548         10.152         12.386
  Sk2.GSM3396182             14.431         11.497         14.215
  T.GSM3396183                6.703          8.502          7.491
  U.GSM3396184                5.657          9.507         10.514
  W.GSM3396185                6.485         14.222          5.522
                
                 CD8 Naive T-cClls NK Cells Late Erythroid Precursors
  A.GSM3396161               2.978    2.233                     3.722
  B.GSM3396162               5.903    1.687                     5.868
  C1.GSM3396163              2.610    1.282                    10.027
  C2.GSM3396165              3.581    1.106                     9.953
  Ck.GSM3396164              2.260    3.875                     5.597
  E.GSM3396166               6.814    1.980                     8.425
  F.GSM3396167               4.608    3.321                     3.920
  G.GSM3396168               4.382    0.780                    20.720
  H.GSM3396169               5.643    6.003                     2.683
  J.GSM3396170              10.756    1.580                     4.874
  K.GSM3396171               3.517   11.404                     1.011
  L.GSM3396172               3.132   11.894                     1.861
  M.GSM3396173               3.211    1.811                     8.063
  N.GSM3396174               2.084    4.724                     1.714
  O.GSM3396175               5.811    5.662                     0.745
  P.GSM3396176               2.784    1.018                     8.982
  Q.GSM3396177               2.135    4.844                    14.039
  R.GSM3396178               5.530    6.310                     6.523
  S1.GSM3396179              3.521    3.366                     8.752
  S2.GSM3396181              4.341    3.635                     7.723
  Sk1.GSM3396180             2.843   13.503                     1.015
  Sk2.GSM3396182             3.171   10.246                     1.122
  T.GSM3396183              11.237    2.292                     3.746
  U.GSM3396184               6.432    1.731                     2.222
  W.GSM3396185               9.406    3.146                     3.082
                
                 Late Erythrocytes Early Erythroid Precursors
  A.GSM3396161              10.280                      3.580
  B.GSM3396162               1.441                      6.395
  C1.GSM3396163             11.767                      4.762
  C2.GSM3396165              9.321                      5.003
  Ck.GSM3396164              7.966                      2.045
  E.GSM3396166               5.709                      3.591
  F.GSM3396167               2.424                      2.693
  G.GSM3396168              29.967                      6.350
  H.GSM3396169               3.734                      0.913
  J.GSM3396170              10.151                      3.563
  K.GSM3396171               1.353                      0.441
  L.GSM3396172               1.107                      1.272
  M.GSM3396173              10.181                      5.501
  N.GSM3396174               2.131                      1.042
  O.GSM3396175               0.490                      1.128
  P.GSM3396176              25.000                      6.826
  Q.GSM3396177               5.419                      3.777
  R.GSM3396178               3.438                      2.836
  S1.GSM3396179              4.816                      4.972
  S2.GSM3396181              5.452                      4.038
  Sk1.GSM3396180             0.711                      2.132
  Sk2.GSM3396182             0.712                      1.057
  T.GSM3396183               8.526                      2.464
  U.GSM3396184               5.657                      1.808
  W.GSM3396185               0.546                      2.022
                
                 Monocyte Progenitor Cells CD16 Monocytes B-Cell Progenitors
  A.GSM3396161                       7.586          2.517              3.651
  B.GSM3396162                       4.216          2.108              4.919
  C1.GSM3396163                      4.945          1.328              6.593
  C2.GSM3396165                      4.055          0.948              8.004
  Ck.GSM3396164                      0.969          0.861              3.875
  E.GSM3396166                       2.026          0.875              2.624
  F.GSM3396167                       1.287          5.925              1.586
  G.GSM3396168                       2.191          0.186              1.894
  H.GSM3396169                       1.355          1.411              0.609
  J.GSM3396170                       3.261          2.050              4.975
  K.GSM3396171                       0.470          1.580              0.285
  L.GSM3396172                       2.049          4.192              0.989
  M.GSM3396173                       2.426          2.460              3.724
  N.GSM3396174                       0.787          4.053              0.556
  O.GSM3396175                       1.852          3.576              0.532
  P.GSM3396176                       5.150          0.988              1.826
  Q.GSM3396177                       0.903          2.791              4.680
  R.GSM3396178                       2.269          1.382              0.142
  S1.GSM3396179                      1.968          3.263              1.450
  S2.GSM3396181                      2.120          2.322              1.514
  Sk1.GSM3396180                     0.406          2.132              1.218
  Sk2.GSM3396182                     0.431          2.351              0.475
  T.GSM3396183                       2.686          1.824              2.908
  U.GSM3396184                       3.332          2.092              2.325
  W.GSM3396185                       3.467          1.445              1.862
                
                 Hematopoietic stem and progenitor cells (HSPC)
  A.GSM3396161                                            6.381
  B.GSM3396162                                            5.552
  C1.GSM3396163                                           4.258
  C2.GSM3396165                                           3.370
  Ck.GSM3396164                                           0.861
  E.GSM3396166                                            3.315
  F.GSM3396167                                            1.197
  G.GSM3396168                                            3.788
  H.GSM3396169                                            0.858
  J.GSM3396170                                            2.521
  K.GSM3396171                                            0.584
  L.GSM3396172                                            1.178
  M.GSM3396173                                            2.016
  N.GSM3396174                                            0.556
  O.GSM3396175                                            1.405
  P.GSM3396176                                            4.820
  Q.GSM3396177                                            1.067
  R.GSM3396178                                            1.276
  S1.GSM3396179                                           2.693
  S2.GSM3396181                                           3.130
  Sk1.GSM3396180                                          0.508
  Sk2.GSM3396182                                          0.496
  T.GSM3396183                                            1.355
  U.GSM3396184                                            1.679
  W.GSM3396185                                            2.183
                
                 Early Erythrocytes Conventional Dendritic Cells (cDC)
  A.GSM3396161                2.340                              3.758
  B.GSM3396162                0.808                              3.162
  C1.GSM3396163               5.174                              3.480
  C2.GSM3396165               4.371                              3.897
  Ck.GSM3396164               4.306                              0.646
  E.GSM3396166                3.315                              1.934
  F.GSM3396167                0.598                              1.706
  G.GSM3396168               10.434                              0.520
  H.GSM3396169                1.604                              1.383
  J.GSM3396170                1.849                              2.353
  K.GSM3396171                0.513                              0.513
  L.GSM3396172                0.542                              1.460
  M.GSM3396173                4.681                              1.640
  N.GSM3396174                1.181                              0.950
  O.GSM3396175                0.149                              2.384
  P.GSM3396176                4.192                              2.096
  Q.GSM3396177                3.284                              1.478
  R.GSM3396178                2.304                              1.028
  S1.GSM3396179               1.968                              1.605
  S2.GSM3396181               2.776                              1.817
  Sk1.GSM3396180              0.203                              0.711
  Sk2.GSM3396182              0.173                              0.992
  T.GSM3396183                2.711                              2.661
  U.GSM3396184                1.188                              2.299
  W.GSM3396185                0.225                              1.862
                
                 B-cells (Plasma cell) Plasmacytoid dendritic cells (pDC)
  A.GSM3396161                   1.453                              1.914
  B.GSM3396162                   1.054                              1.230
  C1.GSM3396163                  2.015                              1.603
  C2.GSM3396165                  1.948                              1.896
  Ck.GSM3396164                  1.507                              0.323
  E.GSM3396166                   2.348                              1.013
  F.GSM3396167                   1.346                              0.598
  G.GSM3396168                   2.971                              0.371
  H.GSM3396169                   0.913                              0.664
  J.GSM3396170                   0.739                              1.412
  K.GSM3396171                   0.128                              0.199
  L.GSM3396172                   0.565                              0.801
  M.GSM3396173                   2.597                              0.615
  N.GSM3396174                   0.440                              0.394
  O.GSM3396175                   0.298                              0.490
  P.GSM3396176                   1.228                              1.048
  Q.GSM3396177                   0.985                              0.411
  R.GSM3396178                   0.248                              0.425
  S1.GSM3396179                  1.916                              1.139
  S2.GSM3396181                  2.221                              1.262
  Sk1.GSM3396180                 1.320                              0.406
  Sk2.GSM3396182                 0.518                              0.777
  T.GSM3396183                   0.542                              0.789
  U.GSM3396184                   1.266                              0.852
  W.GSM3396185                   0.225                              0.578
                
                 Undefined (T-Cell progenitors?) Megakaryocytes/Platelets
  A.GSM3396161                             0.248                    0.354
  B.GSM3396162                             0.070                    0.211
  C1.GSM3396163                            0.000                    0.046
  C2.GSM3396165                            0.053                    0.000
  Ck.GSM3396164                            0.108                    0.000
  E.GSM3396166                             0.000                    0.046
  F.GSM3396167                             0.419                    0.060
  G.GSM3396168                             0.074                    0.037
  H.GSM3396169                             0.138                    0.221
  J.GSM3396170                             0.034                    0.101
  K.GSM3396171                             0.271                    0.142
  L.GSM3396172                             0.942                    0.047
  M.GSM3396173                             0.034                    0.102
  N.GSM3396174                             0.232                    0.185
  O.GSM3396175                             0.234                    0.192
  P.GSM3396176                             0.030                    0.210
  Q.GSM3396177                             0.164                    0.164
  R.GSM3396178                             0.213                    0.035
  S1.GSM3396179                            1.502                    0.155
  S2.GSM3396181                            1.363                    0.303
  Sk1.GSM3396180                           0.000                    0.000
  Sk2.GSM3396182                           0.280                    0.259
  T.GSM3396183                             0.099                    0.222
  U.GSM3396184                             0.232                    0.207
  W.GSM3396185                             0.032                    0.193
                
                 Undefined (Duplets?)
  A.GSM3396161                  0.071
  B.GSM3396162                  0.070
  C1.GSM3396163                 0.137
  C2.GSM3396165                 0.105
  Ck.GSM3396164                 0.000
  E.GSM3396166                  0.092
  F.GSM3396167                  0.209
  G.GSM3396168                  0.000
  H.GSM3396169                  0.055
  J.GSM3396170                  0.000
  K.GSM3396171                  0.028
  L.GSM3396172                  0.071
  M.GSM3396173                  0.068
  N.GSM3396174                  0.046
  O.GSM3396175                  0.000
  P.GSM3396176                  0.030
  Q.GSM3396177                  0.082
  R.GSM3396178                  0.035
  S1.GSM3396179                 0.259
  S2.GSM3396181                 0.050
  Sk1.GSM3396180                0.000
  Sk2.GSM3396182                0.022
  T.GSM3396183                  0.049
  U.GSM3396184                  0.207
  W.GSM3396185                  0.064

Overall percentage of each cell type across all samples:

options("digits"=2)
prop.table(table( bmmc_all$celltype))*100

              CD4 Naive and regulatory T-Cells 
                                        25.175 
                           CD4 Memory T-Cells  
                                        11.069 
                            CD8 Memory T-Cells 
                                         9.994 
                                CD14 Monocytes 
                                         9.089 
                                Mature B-Cells 
                                         6.066 
                             CD8 Naive T-cClls 
                                         4.933 
                                      NK Cells 
                                         4.877 
                     Late Erythroid Precursors 
                                         4.846 
                             Late Erythrocytes 
                                         6.085 
                    Early Erythroid Precursors 
                                         2.813 
                     Monocyte Progenitor Cells 
                                         2.386 
                                CD16 Monocytes 
                                         2.308 
                            B-Cell Progenitors 
                                         2.095 
Hematopoietic stem and progenitor cells (HSPC) 
                                         2.091 
                            Early Erythrocytes 
                                         2.056 
            Conventional Dendritic Cells (cDC) 
                                         1.795 
                         B-cells (Plasma cell) 
                                         1.037 
            Plasmacytoid dendritic cells (pDC) 
                                         0.795 
               Undefined (T-Cell progenitors?) 
                                         0.269 
                      Megakaryocytes/Platelets 
                                         0.154 
                          Undefined (Duplets?) 
                                         0.067 
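The difference between the per-sample and the overall percentages comes down to the margin argument of prop.table. The sketch below illustrates this in base R on a made-up set of ten cells; the sample and cell type names are hypothetical.

```r
# Toy metadata: sample of origin and annotated cell type for ten cells
sample   <- c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
celltype <- c("T", "T", "B", "NK", "T", "B", "B", "B", "NK", "NK")

# Cell counts per sample (rows) and cell type (columns)
counts <- table(sample, celltype)

# margin = 1: every row (sample) is scaled so that it sums to 100
per_sample <- prop.table(counts, margin = 1) * 100

# No margin on a one-way table: share of each cell type among all cells
overall <- prop.table(table(celltype)) * 100

per_sample
overall
```

Because samples contribute different numbers of cells, the overall percentage of a cell type is in general not the mean of its per-sample percentages.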

The annotated reference dataset is saved to disk:

saveRDS(bmmc_all, file = "./StoredRObj/bmmc_Annot_Ref.rds")